You are viewing the RapidMiner Studio documentation for version 10.2.
Split File by Point (Text Processing)
Synopsis
Segments documents by defining the splitting point.
Description
This operator extracts segments from a set of text documents in a directory by splitting each document into parts. The split points are described by a regular expression.
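To illustrate the splitting step, here is a minimal Python sketch. It assumes the operator behaves like a regular-expression split that removes the matched split points. The function name `split_by_point` and the choice to discard empty segments are hypothetical and not taken from RapidMiner. RapidMiner runs on Java, so its regular-expression dialect may differ from Python's `re` in edge cases.

```python
import re


def split_by_point(text: str, split_expression: str) -> list[str]:
    """Hypothetical helper: split `text` wherever `split_expression` matches.

    This approximates the operator's behaviour; it is not RapidMiner code.
    """
    # re.split drops the matched split points and returns the text between them.
    # Note: capturing groups in the pattern would also appear in the result.
    parts = re.split(split_expression, text)
    # Assumption: empty segments (e.g. from consecutive line breaks) are discarded.
    return [part for part in parts if part]


doc = "first paragraph\nsecond paragraph\n\nthird paragraph"
segments = split_by_point(doc, r"\n")
print(segments)
```

With `r"\n"` as the split expression, each line of the document becomes its own segment, and the blank line between the second and third paragraphs produces no segment because of the filtering assumption above.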
Input
through (File): The through port.
Output
through (File): The through port.
Parameters
- preview: Shows a preview of the results that the current configuration will produce.
- texts: The directory containing the documents to be segmented.
- output_directory: The directory to which the segments are written.
- split_expression: A regular expression that specifies the split points in the documents. For example, an expression that matches line breaks splits each document at every line break.
- use_file_extension_as_type: If checked, the type of each file is determined by its extension. Files with unknown extensions are treated as text files.
- content_type: The content type of the input texts.
- encoding: The encoding used for reading and writing files.
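The following Python sketch shows one way these parameters could fit together. It is an illustration under stated assumptions, not RapidMiner's implementation. The extension table `KNOWN_TYPES`, the helper names, the output naming scheme `<stem>_<index><suffix>`, and the decision to skip files not classified as text are all hypothetical. Only the rule that unknown extensions are treated as text comes from the parameter description.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical extension table; the operator's real list is not documented here.
KNOWN_TYPES = {".txt": "text", ".htm": "html", ".html": "html", ".xml": "xml"}


def content_type_for(path: Path, use_file_extension_as_type: bool, content_type: str) -> str:
    """With the flag set, map the extension; unknown extensions count as text."""
    if use_file_extension_as_type:
        return KNOWN_TYPES.get(path.suffix.lower(), "text")
    return content_type


def split_directory(texts: Path, output_directory: Path, split_expression: str,
                    use_file_extension_as_type: bool = True,
                    content_type: str = "text", encoding: str = "utf-8") -> list[Path]:
    """Hypothetical helper: split every plain-text file in `texts` into segment files."""
    output_directory.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(texts.iterdir()):
        if not source.is_file():
            continue
        if content_type_for(source, use_file_extension_as_type, content_type) != "text":
            continue  # assumption: this sketch only handles plain text
        text = source.read_text(encoding=encoding)
        segments = [s for s in re.split(split_expression, text) if s]
        for index, segment in enumerate(segments):
            # Hypothetical naming scheme: <stem>_<index><suffix>
            target = output_directory / f"{source.stem}_{index}{source.suffix}"
            target.write_text(segment, encoding=encoding)
            written.append(target)
    return written


# Usage: one text file is split on line breaks; the HTML file is skipped.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in"
    src.mkdir()
    (src / "notes.txt").write_text("alpha\nbeta\n", encoding="utf-8")
    (src / "page.html").write_text("<p>x</p>", encoding="utf-8")
    out = split_directory(src, Path(tmp) / "out", r"\n")
    names = [p.name for p in out]
    contents = [p.read_text(encoding="utf-8") for p in out]
print(names)
```

Reading and writing with the same `encoding` value mirrors the parameter description, which applies one encoding to both directions.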
